Driven by rapid urbanization and lifestyle changes, type 2 diabetes (T2D) has become one of the most pressing public health issues in China. People with T2D are at risk of multiple complications, including blindness and cardiovascular disease, and are more susceptible to infectious diseases. Consistent with this, researchers have observed a higher tuberculosis (TB) incidence in the population with T2D. We therefore explore the association between TB and T2D and the potential risk factors that contribute to this comorbidity, with the aim of providing recommendations to control or reduce the prevalence of T2D and TB in China.
The planned questions were: what is the incidence rate of TB, and which factors are associated with it, among adults with type 2 diabetes in Shanghai, China, between 2004 and 2014; and how is TB infection distributed geographically in this population? We intended to explore which variables might affect TB incidence in people with type 2 diabetes. The potential factors include gender, sociodemographic factors (e.g. age at diagnosis of T2D), clinical parameters (BMI, fasting glucose), complications of T2D, choice of antidiabetic medication, mode of exercise, and geographical location. Most of these questions were answered in our project. Over the course of the project, we found it necessary to further analyze interactions among these factors, for example the distribution of exercise and the odds ratios for glucose management across districts. We also arrived at a new question: if more data become accessible, a multilevel analysis of TB cases could be explored. For example, we could analyze the individual, street, and district levels separately to check whether they differ in their influence on the number of TB cases.
library(tidyverse)
library(plotly)

load('./dm.Rdata')

# Rename key columns and recode gender, TB status and exercise intensity
df_raw = dm_base %>%
  rename(subject_id = JiBenCID,
         weight = tizhong,
         height = ShenGao,
         exercise_time = xiuxiansj) %>%
  mutate(gender = ifelse(xingbie == 1, "Male", "Female"),
         tb = ifelse(censer == 1, "No", "Yes"),
         exercise = as.factor(xiuxiantl)) %>%
  select(-xingbie, -censer, -xiuxiantl) %>%
  janitor::clean_names()

# Collapse exercise levels 3 and 4 into a single category
levels(df_raw$exercise) <- list("Mild" = 1, "Medium" = 2, "heavy" = c(3, 4))
# Rename the raw columns to descriptive English names
df_combine = dm_base %>%
  rename(
    subject_id = JiBenCID,
    glu_average = fastglu,
    weight_initial = tizhong_1st,
    weight_average = tizhong,
    height = ShenGao,
    glu_initial = kfxt_1st,
    gender = xingbie,
    district = GuanLiQX,
    sys_pressure = Sbp,
    dia_pressure = Dbp,
    exercise_time = xiuxiansj,
    exercise = xiuxiantl,
    drug_insulin = insulin,
    drug_oral_sulfo = sulfonylurea,
    drug_oral_biguanide = biguanide,
    drug_oral_glu = glu_inhib,
    retina = reti,
    skin = derm,
    vessel = vesl,
    nerve = neur,
    kidney = neph,
    depression = depress,
    dmtime = quezhensj,
    birthyear = birth_year,
    birthmon = birth_mon,
    dmdatayear = rucu_year,
    dmdatamon = rucu_mon,
    dmdataage = rucuage,
    drug_order = fuyaoqk) %>%
  mutate(
    gender = factor(gender, labels = c("Male", "Female")),
    district = as.factor(district),
    glu_self_monitor = as.factor(celiangxtgl),
    # BMI = weight (kg) / height (m)^2; height is recorded in cm
    bmi_initial = weight_initial/(height/100)^2,
    bmi_average = weight_average/(height/100)^2,
    bmi_change = bmi_average - bmi_initial,
    glu_change = glu_average - glu_initial,
    tb = as.factor(ifelse(censer == 1, "No", "Yes")),
    exercise = as.factor(exercise),
    # Label each combination of the three oral drug classes
    drug_oral_name = case_when(
      drug_oral_sulfo == "1" & drug_oral_biguanide == "0" & drug_oral_glu == "0" ~ "sulfonylurea",
      drug_oral_biguanide == "1" & drug_oral_sulfo == "0" & drug_oral_glu == "0" ~ "biguanide",
      drug_oral_glu == "1" & drug_oral_biguanide == "0" & drug_oral_sulfo == "0" ~ "glu_inhib",
      drug_oral_sulfo == "1" & drug_oral_biguanide == "1" & drug_oral_glu == "0" ~ "sulfonylurea&biguanide",
      drug_oral_biguanide == "1" & drug_oral_sulfo == "0" & drug_oral_glu == "1" ~ "biguanide&glu_inhib",
      drug_oral_sulfo == "1" & drug_oral_biguanide == "0" & drug_oral_glu == "1" ~ "sulfonylurea&glu_inhib",
      drug_oral_sulfo == "1" & drug_oral_biguanide == "1" & drug_oral_glu == "1" ~ "sulfonylurea&glu_inhib&biguanide",
      TRUE ~ "NA"),
    # Number of drug classes taken: the three oral classes plus insulin
    drug = drug_oral_sulfo + drug_oral_biguanide + drug_oral_glu + drug_insulin,
    retina = as.numeric(retina),
    skin = as.numeric(skin),
    vessel = as.numeric(vessel),
    nerve = as.numeric(nerve),
    kidney = as.numeric(kidney),
    # Total number of recorded complications per patient
    complications = retina + skin + vessel + nerve + kidney + depression,
    complications = as.factor(complications),
    drug_order = as.factor(drug_order)
  ) %>%
  select(-`_COL19`, -ZhiYe, -GuanLiJD, -ZhuZhiQX, -ZhuZhiJD, -JianCaQX, -JianCaJD)
# Collapse exercise levels 3 and 4 into one category
levels(df_combine$exercise) <- list('1' = 1, '2' = 2, '3' = c(3,4))
# Map district codes to district names
levels(df_combine$district) <- list(
  "Huangpu" = 310101, "Xuhui" = 310104, "Changning" = 310105, "Jingan" = 310106,
  "Putuo" = 310107, "Zhabei" = 310108, "Hongkou" = 310109, "Yangpu" = 310110,
  "Minhang" = 310112, "Baoshan" = 310113, "Pudong" = c(310115, 10119), "Jiading" = 310114,
  "Jinshan" = 310116, "Songjiang" = 310117, "Qingpu" = 310118, "Fengxian" = 310120,
  "Chongming" = 310230)
# Codes 2 and 3 both count as not self-monitoring
levels(df_combine$glu_self_monitor) <- list("Yes" = 1, "No" = 2:3)
save(df_combine,file = './data/df_combine.RData')
After a first pass over the data set, we decided to focus on four main risk factors: glucose level, drug usage level, complications level, and daily exercise level. For each risk factor, we examined how its levels are distributed by gender, by district, and by age. Because we are investigating the incidence of tuberculosis among type 2 diabetes patients, we also analyzed the odds of tuberculosis at different levels of each risk factor. We used histograms, density plots, and odds ratio comparison plots to examine these distributions. Our final result is shown in the K-M survival plot, which will be discussed later.
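# File name matches the one used in save() above (the extension is case-sensitive on some systems)
load('./data/df_combine.RData')
# Drop rows without a district and recode TB status as numeric 0/1
df_descrip = df_combine %>%
  filter(district != "") %>%
  mutate(tb = fct_recode(tb, '1'= 'Yes', '0'='No')) %>%
  mutate(tb=as.character(tb),
         tb=as.numeric(tb))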
# person_years
summary(df_descrip$days)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 638 1485 1403 1970 4001
mean_follow_up_year = mean(df_descrip$days)/365
sum_follow_up_year = sum(df_descrip$days)/365
# tb summary
summary(df_descrip$tb)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000000 0.000000 0.000000 0.004607 0.000000 1.000000
sum_tb = sum(df_descrip$tb)
# all participants
nrow(df_descrip)
## [1] 170377
# overall incidence
overall_incidence = sum_tb/sum_follow_up_year
# male incidence
df_male = df_descrip %>% filter(gender == 'Male')
male_incidence = sum(df_male$tb)/(sum(df_male$days)/365)
# female incidence
df_female = df_descrip %>% filter(gender == 'Female')
female_incidence = sum(df_female$tb)/(sum(df_female$days)/365)
With a mean follow-up period of 3.84 years (range: 0 to 10.96 years), 785 TB cases were recorded among 170,377 T2DM patients over 654,949 person-years of follow-up. The overall incidence rate of TB was 119.86 per 100,000 person-years: 224.21 per 100,000 person-years for men and 51.34 per 100,000 person-years for women.
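
Each rate above is the number of TB cases divided by the person-years of follow-up, scaled to 100,000. As a minimal sketch, the overall figure can be reproduced from the two totals reported in the text. The exact Poisson confidence interval from `poisson.test()` is an illustrative addition and was not part of the original analysis.

```r
# Totals taken from the text above
tb_cases     <- 785
person_years <- 654948.65   # 6.5494865 x 10^5 person-years

# Incidence rate per 100,000 person-years
rate_per_100k <- tb_cases / person_years * 1e5
round(rate_per_100k, 2)   # 119.86

# Exact Poisson 95% CI for the rate (illustrative addition, not in the original analysis)
ci <- poisson.test(tb_cases, T = person_years)$conf.int * 1e5
round(ci, 2)
```

The same two lines apply to the male and female subsets once their case counts and person-years are substituted.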

We explored whether regularly monitoring glucose reduces the risk of TB among patients with diabetes in urban and rural areas. The estimates and confidence intervals of the adjusted odds ratio for TB, comparing patients who regularly monitor glucose with those who do not, are similar in urban and rural areas. For each district in Shanghai, we also obtained the estimate and confidence interval of the adjusted odds ratio for TB, comparing patients who regularly monitor glucose with those who do not, holding all other variables fixed. Huangpu district has the highest OR and Baoshan district the lowest.
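
One common way to obtain a per-district adjusted odds ratio is to fit a logistic regression within each district and exponentiate the coefficient for glucose monitoring. The sketch below illustrates that idea on simulated data. The data frame `toy`, its district labels, the covariates chosen for adjustment (age and gender), and the Wald interval are all assumptions for illustration, not the model actually used in this project.

```r
set.seed(1)

# Simulated stand-in for the study data; every value here is made up
n <- 1500
toy <- data.frame(
  district         = rep(c("A", "B", "C"), each = n / 3),
  glu_self_monitor = factor(sample(c("No", "Yes"), n, replace = TRUE),
                            levels = c("No", "Yes")),
  age              = round(rnorm(n, mean = 60, sd = 10)),
  gender           = factor(sample(c("Male", "Female"), n, replace = TRUE))
)
# Simulated outcome with a protective effect of monitoring (log-odds of -0.5)
lin_pred <- -1 - 0.5 * (toy$glu_self_monitor == "Yes") + 0.01 * (toy$age - 60)
toy$tb <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Fit one logistic model per district and extract the adjusted OR for monitoring
or_by_district <- do.call(rbind, lapply(split(toy, toy$district), function(d) {
  fit <- glm(tb ~ glu_self_monitor + age + gender, family = binomial, data = d)
  est <- unname(coef(fit)["glu_self_monitorYes"])
  # Wald interval on the log-odds scale, exponentiated below
  ci  <- unname(confint.default(fit)["glu_self_monitorYes", ])
  data.frame(district = d$district[1],
             or = exp(est), lower = exp(ci[1]), upper = exp(ci[2]))
}))
or_by_district
```

An OR below 1 for `glu_self_monitorYes` would indicate lower odds of TB among patients who monitor glucose, within that district and for fixed age and gender.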

We found that the median and quartiles are higher in patients without TB than in patients with TB.
In the drug analysis, we first examined the distribution of drug usage across districts, ages, and genders. By gender and by district, there are no significant changes. Between genders, males are more likely to have diabetes and to take more than two kinds of drugs, including insulin and oral drugs. The 95% confidence intervals for the odds of tuberculosis at different drug levels overlap with each other, which suggests no significant difference in the odds of tuberculosis among people taking different numbers of drugs.
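
The interval comparison described above can be sketched as follows. The counts are hypothetical and only illustrate the arithmetic. The Wald interval on the log-odds scale is one standard choice and is an assumption here, since the text does not say how the intervals were computed.

```r
# Hypothetical counts per number of drugs taken; not taken from the study data
counts <- data.frame(
  drug   = c("0", "1", "2+"),
  tb_yes = c(40, 55, 30),
  tb_no  = c(9000, 12000, 6500)
)

# Odds of TB and a Wald 95% CI computed on the log-odds scale
counts$odds  <- counts$tb_yes / counts$tb_no
se_log_odds  <- sqrt(1 / counts$tb_yes + 1 / counts$tb_no)
counts$lower <- exp(log(counts$odds) - qnorm(0.975) * se_log_odds)
counts$upper <- exp(log(counts$odds) + qnorm(0.975) * se_log_odds)

# TRUE when the intervals of rows i and j overlap
overlaps <- function(i, j) {
  counts$lower[i] <= counts$upper[j] && counts$lower[j] <= counts$upper[i]
}
overlaps(1, 2)
```

Overlapping intervals are an informal check. A formal comparison would test the difference between levels directly, for example through the odds ratio between them.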
# Count complications per patient and group the counts into three levels
df_complication <- df_combine %>%
  mutate(retina = as.numeric(retina),
         skin = as.numeric(skin),
         vessel = as.numeric(vessel),
         nerve = as.numeric(nerve),
         kidney = as.numeric(kidney)) %>%
  mutate(complications = retina + skin + vessel + nerve + kidney + depression) %>%
  mutate(complications = as.factor(complications))
# The "more_than_two" level covers counts from 2 to 6
levels(df_complication$complications) <- list(none=0,one=1,more_than_two=c(2:6))
# Frequency table of the three complication levels
freqtable <- table(df_complication$complications)
df_com <- as.data.frame.table(freqtable) %>%
  rename(complications = Var1,
         Frequence = Freq)
knitr::kable(df_com)
| complications | Frequence |
|---|---|
| none | 144564 |
| one | 20841 |
| more_than_two | 4994 |
# Bar chart of patient counts by number of complications
# geom_col() plots precomputed counts directly, without the stat = "identity" warning
plot_com2 <- ggplotly(
  ggplot(df_com, aes(x = complications, y = Frequence, fill = complications)) +
    geom_col(width = .6) +
    labs(title = "The Frequency of complications",
         x = "How many complications for diabetes patients",
         y = "Frequency") +
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.title.y = element_text(face = "bold", size = 12),
          axis.text.y = element_text(angle = 0, vjust = 0.5, size = 10),
          legend.title = element_text(size = 12, face = "bold"),
          legend.text = element_text(size = 12, face = "bold")))
plot_com2
# Exercise load = intensity level (1 to 3) multiplied by exercise time
df_exercise = df_combine %>%
  mutate(exercise = as.numeric(exercise),
         total_exercise = exercise * exercise_time,
         gender = as.factor(gender))
# Average the exercise load per age and gender so the bars match the plot title
plot_exer = df_exercise %>%
  group_by(gender, dmage) %>%
  summarise(total_exercise = mean(total_exercise, na.rm = TRUE)) %>%
  ungroup() %>%
  ggplot(aes(x = dmage, y = total_exercise, fill = dmage)) +
  geom_col(width = .6) +
  labs(title = "The average exercise vs age",
       x = "age",
       y = "average exercise") +
  facet_wrap(~gender)
ggplotly(plot_exer)